This paper introduces two new closely related betweenness centrality measures based on the
Randomized Shortest Paths (RSP) framework, which fill a gap between traditional network centrality
measures based on shortest paths and more recent methods considering random walks or current
flows. The framework defines Boltzmann probability distributions over paths of the network which
focus on the shortest paths, but also take into account longer paths depending on an inverse
temperature parameter. RSP’s have previously proven to be useful in defining distance measures on
networks. In this work we study their utility in quantifying the importance of the nodes of a
network. The proposed RSP betweenness centralities combine, in an optimal way, the ideas of using
the shortest and purely random paths for analysing the roles of network nodes, avoiding issues
involving these two paradigms. We present the derivations of these measures and how they can be
computed in an efficient way. In addition, we show with real world examples the potential of the
RSP betweenness centralities in identifying interesting nodes of a network that more traditional
methods might fail to notice.


I. Introduction 

One of the most fundamental and popular topics in network
science is determining the centrality of a node in a
network according to the structure of the network. The
concept of centrality can be interpreted in many ways and
a vast number of measures have been proposed based on
different interpretations. One commonly used interpretation
is betweenness centrality, which reflects the extent
to which a node lies in between pairs or groups of other
nodes of the graph. This can also be stated as the extent
to which a node is an intermediary in communication
over the network. Different models have been proposed
to measure the participation of a node in this communication,
ranging from the shortest path betweenness centrality
of Freeman m, which considers communication
flowing only along the shortest paths, to the current flow
betweenness centrality [HE], which interprets communication
as electric current or as random walks in
the network.

In this article we propose two families of betweenness
centrality measures based on the Randomized Shortest
Paths (RSP) framework [5HH]. The framework is based
on Boltzmann probability distributions over paths between
the nodes of a network which focus on short, optimal
paths, but give some probability mass also to longer
paths. The extent of focus on optimal paths is controlled
by an inverse temperature parameter $\beta$. The RSP framework
has previously been shown to function well when
defining distance measures on networks for clustering and
classification of network nodes [S]. In this work we extend
the study of RSP’s by showing their potential also in
defining centrality measures. The two RSP betweenness
centralities presented in this paper measure the involvement
of each node in RSP’s between the nodes of the
graph. The first measure, which we call the simple RSP
betweenness, measures the expected number of visits to
a node during RSP’s, while the second one, called the
RSP net betweenness, is the sum of expected net flows
over the edges connected to a node.

The proposed RSP betweenness measures are attractive
both theoretically and in practice. Theoretical interest
is ensured by the fact that both measures can be
seen as generalizations of classical betweenness measures.
With large values of the parameter $\beta$, both RSP betweenness
measures converge to a measure that we introduce
as the shortest path likelihood betweenness, which is very
closely related to the original betweenness centrality defined
by Freeman m and its other similar variant, the
load centrality [S]. The reason for defining two different
betweenness measures with RSP’s is in their behaviour
as $\beta$ is decreased. Namely, the simple RSP betweenness
then converges to the stationary distribution of a random
walk on the network (multiplied by a constant), whereas
the RSP net betweenness converges to the current flow
betweenness. The definition, as well as the computation,
of the simple RSP betweenness is more straightforward
than that of its counterpart, the RSP net betweenness,
which can also be said of their corresponding limit
functions. In addition, the experiments in Section V indicate
that the simple RSP betweenness can in practice be
more useful than the RSP net betweenness or the current
flow betweenness. However, the choice of which definition
to rely on in the end depends on the application
domain.

Considering betweenness based on RSP’s is motivated by
the fact that measures based only on shortest paths or
on random walks alone often involve undesirable features.
Shortest paths in a complex network tend to pass through
only a small fraction of the nodes of the network, which
can cause highly skewed betweenness score distributions
and fail to differentiate between the other nodes of the
network mm. Also, when considering communication
or navigation in networks, it is not always realistic
to assume that they occur only along the shortest paths
or along completely random paths. Instead, movement
may follow a partly random route, with a drift towards a
destination, for example when the navigating agent does
not know the optimal way or wants to add secrecy and
unpredictability to its route. As a result, the trajectories
that are actually used in the network can be more spread
over different nodes in areas of the network that contain
many connections. Because of this, RSP’s can help, for
instance, in detecting bottlenecks of the network, where
there exist no alternatives for the shortest path.

Measures based solely on random walks do take into 
account the abundance of connections between nodes. 
However, they may in many situations depend heavily on 
local features of a graph, especially for large graphs ffn- 
[ 13 ] instead of capturing its interesting global properties. 
Thus, some regularisation over the degree of randomness
is needed, which in the RSP framework is controlled by
the inverse temperature parameter $\beta$.

Models that find a compromise between the optimal
shortest path and a random walk have recently received
a lot of attention. Compared to these, the attractive
aspect in using the RSP framework is that the compromise
between the shortest and random paths is optimal
by definition, as the Boltzmann distribution minimizes
the expected cost of paths subject to a fixed relative
entropy [31|7|. The minimization can also be expressed with
respect to free energy [Sj. In addition to the optimality
aspect, the computation of quantities related to RSP’s is
fairly straightforward and efficient. A drawback of the
algorithms presented in the paper is that, as such, they
are not tractable for very large networks. However, in
the future we plan to develop more specialized methods
that will enable the RSP quantities to be computed for
large networks as well.

Ideas similar to the RSP framework of interpolating between
the two extremes can be found in the work of
Alamgir and von Luxburg [a, involving p-resistances
and graph node distances based on them, of Chebotarev
involving distances based on the matrix forest theorem
[ISl HD: of Zhang and Boley HZ! with focus on routing
schemes, and of Estrada, who has defined the subgraph
centrality m and the communicability betweenness [Hi.
Even more related to this paper are the recent works of
Bavaud and Guex [20] and Lebichot et al. m. In fact,
the two betweenness centrality measures presented in this
paper can be shown equal to the betweenness measures
proposed by Bavaud and Guex [20], although the relation
between the two works is not entirely obvious. We will
discuss this relation and the relation between the RSP
betweenness measures and other previously proposed
betweenness measures in more detail in Section III.

To sum up, the contributions of this paper are: 

• We define two betweenness centrality measures 
which form a spectrum between measures based on 
shortest paths and pure random walks, therefore 
integrating information about both the optimality 
and the abundance of paths between nodes of the 
network, 

• We derive algorithms for computing these measures 
in a convenient and efficient way, and 

• We demonstrate with example networks that the 
proposed RSP betweenness measures may provide 
a more interesting ranking of the network nodes 
than the classical measures that they generalize. 

The structure of the paper is as follows: Section II introduces
the notation and defines the terms as used in the
paper. Section III lists existing network centrality measures
with a focus on different interpretations of betweenness
centrality. Section IV reviews the RSP framework
and then introduces the RSP betweenness centrality measures
and the methodology for computing them. Section
V presents example cases that illustrate the benefits of
using RSP betweenness measures.

II. Notation and terminology 

In the paper we consider weighted directed graphs $G = (V, E)$
with node set $V = \{1, 2, \ldots, n\}$ and edge set
$E = \{(i, j)\}$ of $m$ edges. We define a path, or walk,
interchangeably, as a sequence of nodes $\wp = (i_0, \ldots, i_T)$,
where $T > 0$ and $(i_\tau, i_{\tau+1}) \in E$ for all $\tau = 0, \ldots, T - 1$.
A path is absorbing if the last node of the path appears
on the path only once. We denote the set of all absorbing
paths starting from node s and ending in node t, i.e.
s-t-paths or s-t-walks, by $\mathcal{P}_{st}$.

The weights on edges, $a_{ij}$ with $(i, j) \in E$, reflect the
similarity or strength of connection between adjacent nodes
and form the adjacency matrix $\mathbf{A}$ of the graph. The edge
weights define the reference transition probabilities of the
unbiased random walk as $p^{\mathrm{ref}}_{ij} = a_{ij} / \sum_{j'} a_{ij'}$. The
transition probabilities form the reference transition probability
matrix $\mathbf{P}^{\mathrm{ref}}$, which can be computed as $\mathbf{P}^{\mathrm{ref}} = \mathbf{D}^{-1}\mathbf{A}$,
where $\mathbf{D}$ is the diagonal matrix containing the row sums
of $\mathbf{A}$. The reference path probability $P^{\mathrm{ref}}_{st}(\wp)$ of a path
$\wp \in \mathcal{P}_{st}$ is simply defined as the product of the transition
probabilities along the path.
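The definitions above can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and a small hypothetical 4-node example graph that is not taken from the paper; the function names are likewise our own.

```python
import numpy as np

def reference_transition_matrix(A):
    """Row-normalise the weighted adjacency matrix: P_ref = D^{-1} A."""
    A = np.asarray(A, dtype=float)
    row_sums = A.sum(axis=1)
    if np.any(row_sums == 0):
        raise ValueError("every node needs at least one outgoing edge")
    return A / row_sums[:, None]

def reference_path_probability(P_ref, path):
    """Product of the reference transition probabilities along a node sequence."""
    prob = 1.0
    for i, j in zip(path[:-1], path[1:]):
        prob *= P_ref[i, j]
    return prob

# Hypothetical undirected example graph with unit weights (nodes 0..3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P_ref = reference_transition_matrix(A)
# Reference probability of the absorbing 0-3-walk (0, 1, 3).
p = reference_path_probability(P_ref, [0, 1, 3])
```

Node indices are zero-based in the sketch, whereas the text numbers the nodes from 1 to n.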

In addition to the weights, the edges are also assigned
costs $c_{ij}$, which, in contrast to weights, can be considered
as the dissimilarity or distance of adjacent nodes.
The cost of a path $\wp$ is then simply defined as the sum of
the costs of its edges, $c(\wp) = \sum_{\tau=1}^{T} c_{i_{\tau-1} i_\tau}$.
Accordingly, we use the term shortest path
to mean the path between two nodes with the lowest
cost over all paths between the nodes. We denote the set
of shortest paths from node s to node t by $\mathcal{P}^*_{st}$, the total
number of such paths by $|\mathcal{P}^*_{st}|$ and the cost of the shortest
path from s to t by $c^*_{st}$. The directed graph that consists
only of the nodes and directed edges that belong to the
shortest paths from s to t is called the directed shortest
path graph from s to t.

In many situations the edge costs and edge weights can
be defined based on one another, for example, as reciprocals
$c_{ij} = 1/a_{ij}$, which corresponds to the interpretation
of costs as resistances and weights as conductances
in an electric circuit. This convention is also normally
used when computing centrality measures on weighted
networks. However, in general in the RSP framework
the weights can be independent of the costs, as long as
$c_{ij} < \infty$ whenever $a_{ij} > 0$. Accordingly, the transition
probabilities and thus the unbiased random walk
can be independent of the costs. This means that the
edge costs define the interpretation of shortest paths, i.e.
the low temperature behavior of the system, whereas the
edge weights determine the interpretation of a random
walk, i.e. the high temperature behavior. The interplay
between weights and costs is thus similar to a trade-off
between exploration, based on local possible movements,
and exploitation, based on long-term preferred
movements.
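As a small illustration of the reciprocal convention, the sketch below builds a cost matrix from a weight matrix. It assumes NumPy; the helper name and the choice to mark non-edges with an infinite cost are our own assumptions, made so that the condition that costs are finite wherever weights are positive holds by construction.

```python
import numpy as np

def costs_from_weights(A):
    """Edge costs as reciprocals of weights; non-edges get infinite cost."""
    A = np.asarray(A, dtype=float)
    C = np.full(A.shape, np.inf)
    mask = A > 0
    C[mask] = 1.0 / A[mask]
    return C

# Hypothetical weighted undirected graph on three nodes.
A = np.array([[0.0, 2.0, 0.0],
              [2.0, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
C = costs_from_weights(A)
```

Any other cost matrix can be substituted, since the framework only requires finite costs on existing edges.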


III. Graph node centrality

The concept of graph node centrality has many interpretations,
and for most of the interpretations there exists a
lengthy catalog of different proposed measures, which are
often derived for different application purposes. In addition,
there have been efforts to state axioms that a centrality
measure should satisfy |21112S]. Also, as discussed
by Kolaczyk [24], there have been attempts to define a
typology of centrality measures, for instance by Borgatti
[25]. In an interesting recent work, Brandes and Hildenbrand
study the problem of finding minimal graphs for
which different centrality measures rank different nodes
as the most central [55]. In this section, we make a brief
survey of the different centrality measures with special
focus on betweenness centrality measures.

There are different possibilities in whether or not the
source and target nodes, s and t, should be considered
also as intermediate nodes of a path for centrality considerations.
In this paper we use the convention where
the first and last node of a path do affect the betweenness
scores. For betweenness measures based on shortest
paths this choice only changes the overall betweenness
scores by an additive constant in a strongly connected
graph, and thus only affects the ranking of nodes when
the network has several components. In contrast, when
measuring betweenness based on random walks, further
visits to the starting node s after the first step may increase
the betweenness score of node s and thus may
also affect the rankings. Betweenness measures are also
often normalized, e.g., according to the number of possible
node pairs, when considering shortest paths between
nodes. However, in this paper we leave the normalization
out of consideration in all the definitions, because
it never affects the rankings of nodes within a strongly
connected network.

A. Betweenness centralities

1. Betweenness based on shortest paths

Possibly the best-known centrality measure of all is the
original betweenness centrality of Freeman m, which
counts the fraction of shortest paths between a pair of
nodes that an intermediate node lies on and sums these
fractions over all node pairs. We will also refer to it as
shortest path betweenness, for specificity. Formally, the
shortest path betweenness centrality of node i can be
expressed as

\[
\sum_{s,t=1}^{n} \frac{\eta(i \in \mathcal{P}^*_{st})}{|\mathcal{P}^*_{st}|}
\tag{1}
\]

where $\eta(i \in \mathcal{P}^*_{st})$ means the number of shortest paths
from s to t that contain node i. Notice that if there is more
than one shortest path connecting s to t, each of these paths
will contribute a score of $1/|\mathcal{P}^*_{st}|$ to the betweenness of
the nodes on it.


There are several variations of the shortest path betweenness
defined above. A thorough review of these variants
and their efficient computation was provided by Brandes
[27]. One variant, called the load centrality IS EH EH],
replaces the fractional term inside the sum in Equation (1) with
the branching probability of a path, i.e. the probability
that a random walker moving in the directed shortest
path graph from s to t follows a path that contains
node i, when at each branching point in this graph it
selects the edge to follow with uniform probability over
all outgoing edges. For illustration of the difference between
the shortest path betweenness and load betweenness,
see m. With weighted graphs the two measures
are usually equal, as there often exists a unique shortest
path between all nodes, especially if the weights are
real-valued.

In what follows, we will need to consider another variant
of the shortest path betweenness, which we have not encountered
in the literature previously. We call this measure
the shortest path likelihood betweenness. Similarly to
the load centrality, we define the likelihood betweenness
by replacing the term inside the sum in Equation (1) with the
normalized likelihood of a shortest path containing node i.
The likelihood of a shortest path $\wp^* \in \mathcal{P}^*_{st}$ is the product
of the reference transition probabilities along that path,
i.e. the same as its reference path probability $P^{\mathrm{ref}}_{st}(\wp^*)$.
Note that the likelihood is different from the branching
probability, which is based on transition probabilities
in the directed shortest path graph from s to t, instead
of the whole graph. The normalized likelihood of $\wp^*$ is
then obtained by normalizing by the sum of likelihoods
of all shortest paths,
$P^{\mathrm{ref}}_{st}(\wp^*) / \sum_{\wp \in \mathcal{P}^*_{st}} P^{\mathrm{ref}}_{st}(\wp)$, which is
also the contribution of $\wp^*$ to the shortest path likelihood
betweenness of all the nodes along it. The shortest
path likelihood betweenness is introduced here for the
sake of completeness because it is the limiting function
of both RSP betweenness measures presented in this paper.
However, the differences between the three variants
of shortest path betweenness presented above are very
small, and in practice they provide very similar rankings
of the nodes of a network.
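To make the normalized likelihood concrete, the following sketch enumerates the shortest paths between two nodes of a small hypothetical graph by brute force and normalizes their reference path probabilities. It assumes NumPy, strictly positive edge costs (so that shortest paths are simple paths) and a graph small enough for exhaustive enumeration; it is an illustration only and not the computational method of the paper.

```python
import numpy as np

def simple_paths(A, s, t, path=None):
    """Yield all simple paths from s to t by depth-first search (small graphs only)."""
    path = [s] if path is None else path
    if path[-1] == t:
        yield list(path)
        return
    for j in np.flatnonzero(A[path[-1]] > 0):
        if j not in path:
            path.append(int(j))
            yield from simple_paths(A, s, t, path)
            path.pop()

def normalized_shortest_path_likelihoods(A, C, s, t):
    """Return the shortest s-t-paths and their normalized likelihoods."""
    P_ref = A / A.sum(axis=1, keepdims=True)
    edges = lambda p: zip(p[:-1], p[1:])
    cost = lambda p: sum(C[i, j] for i, j in edges(p))
    paths = list(simple_paths(A, s, t))
    c_min = min(cost(p) for p in paths)
    shortest = [p for p in paths if np.isclose(cost(p), c_min)]
    lik = np.array([np.prod([P_ref[i, j] for i, j in edges(p)]) for p in shortest])
    return shortest, lik / lik.sum()

# Hypothetical example: edges 0-1, 0-2, 1-3, 2-3, 2-4 with unit weights and costs.
A = np.zeros((5, 5))
for i, j in [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]:
    A[i, j] = A[j, i] = 1.0
C = np.where(A > 0, 1.0, np.inf)
shortest, norm_lik = normalized_shortest_path_likelihoods(A, C, 0, 3)
```

In this example the two shortest paths from node 0 to node 3 each receive the fraction 1/2 under Equation (1), whereas their normalized likelihoods differ because node 2 has an additional neighbor that lowers the transition probability from node 2 to node 3.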


2. Betweenness based on random walks

Betweenness has also been considered with respect to
other than just shortest paths, namely by considering
random walks or flows on a network. The first such measure
was proposed by Freeman [29], who defined the flow
betweenness centrality as the amount of flow through a
node over maximum flows between all node pairs. The
idea of considering flows was developed further by Newman
[1], who defined the current flow betweenness centrality,
which measures the centrality of a node as the total
sum of electrical current that flows through it, when
considering all node pairs as source-sink pairs of a unit
current flow. The current flow betweenness was also
coined the random walk betweenness centrality because
of the well-established connection between electric current
flows and random walks m. Indeed, it can also be
interpreted as the sum of expected net flows of a random
walk over the edges connected to a node, meaning that
the number of times that the walk enters or leaves the
node along an edge cancel each other out. Brandes and
Fleischer [5] developed an algorithm which improves the
efficiency of computing the current flow betweenness for
all nodes of a network. The properties and computation
of the current flow betweenness have also been studied
by Bozzo and Franceschet [31].

Instead of considering net flows, a more straightforward
definition of betweenness based on random walks is simply
the overall expected number of visits to a node during
a random walk. For arbitrary, non-absorbing walks,
this quantity is not well-defined. However, it is possible
to compute the proportion of steps of such a walk
that the walker spends in a node. This is equal to the
probability of finding the walker at the node after a long
walk, i.e. the stationary probability of the node. The stationary
probabilities define the stationary distribution,
which is the unique vector $\boldsymbol{\pi}$ that satisfies the equation
$\boldsymbol{\pi} = (\mathbf{P}^{\mathrm{ref}})^{\mathsf{T}} \boldsymbol{\pi}$, given that the network is strongly
connected and aperiodic. It is well known that for such
graphs the stationary probabilities are inversely proportional
to the mean recurrence times and, if the graph is undirected,
proportional to the degree centralities (or strengths, for
weighted graphs) [32]. For other graphs, the stationary
distribution is not necessarily unique or may not even exist.
To overcome this issue, for instance, the PageRank algorithm [33] uses
a teleportation probability which transforms any kind
of a graph into a strongly connected approximation and
computes the stationary distribution on the transformed
graph.
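The relation between the stationary distribution and node strength on an undirected graph can be checked numerically. The sketch below is a minimal illustration, assuming NumPy, plain power iteration and a hypothetical 4-node graph that contains a triangle (and is therefore aperiodic); none of these choices come from the paper.

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=100000):
    """Power iteration for pi = P^T pi (assumes a strongly connected, aperiodic graph)."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new = P.T @ pi
        if np.abs(new - pi).sum() < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")

# Hypothetical undirected graph with unit weights; nodes 0, 1, 2 form a triangle.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
P_ref = A / A.sum(axis=1, keepdims=True)
pi = stationary_distribution(P_ref)
# Node strengths normalised to sum to one, for comparison with pi.
strength = A.sum(axis=1) / A.sum()
```

For a directed or periodic graph the iteration above may fail to converge, which mirrors the caveat in the text about graphs without a unique stationary distribution.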

Even though the expected number of visits to a node is
not well-defined for arbitrary walks, it is well-defined for
absorbing walks. Somewhat surprisingly, we have discovered
that when computed over all absorbing random walks
on a strongly connected aperiodic directed graph, this
measure is equal to the stationary distribution, up to a
multiplicative constant. The multiplicative constant is the
sum, over all s-t-pairs of the graph, of average hitting
times, i.e. the expected number of steps along an absorbing
path from s to t. This result is
not completely obvious and we have not found it mentioned
explicitly in the literature. The proof of the result
is omitted from the paper, as it is not relevant for this
work, although the result itself is relevant for considering
the RSP-based betweenness measures.

The RSP betweenness centrality measures proposed in
this paper are also based on random walks, more particularly
on the RSP framework, which we will review
in more detail in Section IV A. The first betweenness
measure is based on counting the expected number of
passages through a node of a random walker moving according
to the RSP probabilities, while the other computes
the expected net flow of walkers going through the
node.

In fact, the RSP betweenness measures coincide with betweenness
measures proposed recently, independently of
our work, by Bavaud and Guex [20]. They propose a
framework which interpolates between shortest paths and
random walks by the minimization of free energy in a
similar fashion as the free energy derivation of the RSP
framework of Kivimäki et al. |B]. One main difference
in the work of Bavaud and Guex [20], compared to the
derivation of RSP’s, is that a more general form of energy
functionals, besides the expected length (or cost) of
a path, is considered. In addition, the relative
entropy is considered with respect to transition probabilities
instead of path probabilities, as is done in the definition
of RSP’s. The path distribution derived by Bavaud
and Guex [20] can, however, be shown to equal the path
distribution defined in the RSP framework, although this
requires some lengthy and uninteresting derivations and
is left out of this paper. The relation between the two
approaches has also been studied in the more recent
work of Guex and Bavaud [34]. Compared to the
work of Bavaud and Guex [20], our work focuses more on
the computational and practical aspects of the methodology.
Thanks to the recent developments in the RSP
framework [8|, we are also able to present efficient algorithms
for computing the quantities in question and
illustrate the use of the methodology with practical examples.

Also closely related to the idea behind the RSP betweenness
centralities is the bag-of-paths (BoP) betweenness
centrality [21]. It is based on the BoP framework, which
also defines a Boltzmann distribution on the paths of a
graph in a similar way as the RSP framework. The BoP
betweenness is then defined as the a posteriori probability
that a path selected according to the Boltzmann distribution
visits an intermediate node i, when it walks from
node s to node t according to the BoP probabilities. The
BoP betweenness is also defined for groups and used for
semi-supervised classification of graph nodes. A similar
group betweenness measure can be defined from the RSP
betweenness measures proposed here, but this is left out
of the scope of this paper. The node classification task
was also tackled by Devooght et al. [SS] by using a modularity
measure derived from the BoP framework.

B. Other centralities 

Besides betweenness, centrality has also been characterized
with other additional terms such as closeness, feedback
and vitality, to name a few [36]. There have also
been many efforts for stating axioms that would define
centrality, starting from the work of Sabidussi [12] and
recently by Boldi and Vigna m. The concept of centrality
can also be considered with respect to edges, instead of
nodes. Although in this work we focus on centrality according
to the betweenness interpretation, we nonetheless
mention some of the other interpretations in this section,
as the RSP framework could be combined with many of
them too.

Closeness centrality measures are based on some interpretation
of overall proximity of a node to the other nodes
of a network. The shortest path closeness centrality of
node i is classically defined as $C_i = 1 / \sum_{j=1}^{n} \Delta_{ij}$,
where $\Delta_{ij}$ is the shortest path distance between nodes
i and j [221138j. Considering communication on a network,
betweenness centrality can be interpreted as the
amount of control of a node, whereas closeness centrality
measures the efficiency of the communication of the
node [a m\. Instead of using the shortest path distance
to define closeness centrality, other distances can
be used too. For instance, White and Smyth [50] define
Markov centrality by replacing the shortest path distance
in the closeness centrality definition with the average hitting
time of an unbiased random walker, i.e., essentially
the (unsymmetrized) commute time distance. Related to
that, Brandes and Fleischer [5] considered a current flow
analogy of closeness centrality by replacing the distance
between two nodes with their potential difference. They
managed to show that the current flow closeness centrality
is equivalent to the information centrality defined by
Stephenson and Zelen [H]. The equivalence has been
confirmed with another proof by Bozzo and Franceschet
m. One subject for future research will be to extend
closeness centrality by replacing the shortest path distance
with RSP-based distance measures.

Feedback centrality measures are in many ways related to
random walk based centrality measures. The eigenvector
centrality [42] is the archetype feedback measure and is
based on the idea that the centrality of a node should
be the sum of the centralities of its neighbors. The solution
of the formulation is the eigenvector of the adjacency
matrix corresponding to the largest eigenvalue. The previously
mentioned PageRank [33] can also be considered
in this sense and formulated as an eigenvector. The Katz
centrality [33] is based on the same idea of feedback, but
considers also longer dependencies than only the neighbors
of a node. The effect of longer dependencies decays
according to the distance between nodes. The subgraph
centrality [H] and communicability betweenness US] are
based on similar ideas using the matrix exponential of the
adjacency matrix. The relationship between all the feedback
centrality measures listed above has been studied
by Benzi and Klymko [33].

IV. Randomized shortest paths betweenness centralities

In this section we introduce two betweenness measures
based on the randomized shortest paths (RSP) framework
[BHS]. These measures both generalize the shortest
path likelihood betweenness measure defined in Section
III A 1. In addition to that, one of the two measures
also generalizes the stationary distribution of the
graph while the other generalizes the current flow betweenness
centrality. The RSP framework has previously
been used for defining distance measures between graph
nodes, which has proven useful for many data analysis
tasks such as clustering and classification of graph nodes
PIH].


A. Randomized shortest paths 

At its core, the RSP framework [BHE] [45] is based on a
probability distribution over paths between two nodes of
a graph. The framework can also be formulated considering
all paths, including non-absorbing paths, but in
this work, for simplicity, we restrict the framework to
absorbing paths. The RSP probability distribution can
be defined through either minimization of expected cost
subject to a relative entropy constraint, or minimization
of free energy [a ED]. Here we recall the former definition
of minimization of expected cost, which is perhaps
a more intuitive one.

The RSP framework is based on the probability distribution
over the set $\mathcal{P}_{st}$ of absorbing s-t-walks for which the
expected cost of the walks is minimal, when constrained
with a fixed relative entropy with respect to the reference
path probability distribution. Formally, we seek the
solution to the following problem

\[
\begin{aligned}
&\text{minimize} && \sum_{\wp \in \mathcal{P}_{st}} P_{st}(\wp)\, c(\wp) \\
&\text{subject to} && J(P_{st} \,\|\, P^{\mathrm{ref}}_{st}) = J_0, \\
& && \sum_{\wp \in \mathcal{P}_{st}} P_{st}(\wp) = 1,
\end{aligned}
\tag{2}
\]

where $P^{\mathrm{ref}}_{st}(\wp)$ is the reference path probability, $c(\wp)$ the
overall cost of path $\wp$ and $J(P_{st} \,\|\, P^{\mathrm{ref}}_{st})$ is the relative
entropy, or Kullback-Leibler divergence, which is set to a
desired level $J_0$.

The solution of this minimization is a Boltzmann distribution
(for details, see IS!?]):

\[
P_{st}(\wp) = \frac{P^{\mathrm{ref}}_{st}(\wp) \exp(-\beta c(\wp))}{\mathcal{Z}_{st}},
\tag{3}
\]

where

\[
\mathcal{Z}_{st} = \sum_{\wp \in \mathcal{P}_{st}} P^{\mathrm{ref}}_{st}(\wp) \exp(-\beta c(\wp))
\tag{4}
\]

is the partition function of absorbing walks from s to
t and the inverse temperature parameter $\beta$ controls the
divergence from the unbiased random walk probabilities.
When applying the framework, the user is supposed to
input the value for $\beta$, instead of the relative entropy $J_0$.
Low and high values of $\beta$ correspond, respectively, to
low and high values of $J_0$ and, inversely, to high and low
temperature. In other words, for high values of $\beta$ (low
temperature), the path distribution $P_{st}$ focuses on shortest
paths, whereas for low values of $\beta$ (high temperature)
more random paths are also preferred. It is also possible
to find the model corresponding to a particular value of
$J_0$, for instance, with a binary search over different values
of $\beta$.

The partition function $\mathcal{Z}_{st}$ plays an important role in the
derivation of the computation of many quantities related
to the RSP framework, as can be seen in the following.
Concerning the computation of $\mathcal{Z}_{st}$, we refer to earlier
works of Françoisse et al. [15] and Kivimäki et al. [S]
which show that it can be expressed as

\[
\mathcal{Z}_{st} = \frac{z_{st}}{z_{tt}},
\tag{5}
\]

where $z_{st}$ is the element (s, t) of the fundamental matrix
of non-absorbing paths, defined as

\[
\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}, \quad \text{with} \quad \mathbf{W} = \mathbf{P}^{\mathrm{ref}} \circ \exp(-\beta \mathbf{C}),
\tag{6}
\]

where $\circ$ and exp denote the element-wise matrix multiplication
and exponential, respectively. The matrix $\mathbf{W}$,
defined from the reference transition probability matrix
$\mathbf{P}^{\mathrm{ref}}$ and the cost matrix $\mathbf{C}$, is a substochastic matrix
and can be interpreted as defining a killed random walk.
Consequently, the partition function $\mathcal{Z}_{st}$ can then be interpreted
as the probability of a walker surviving the walk
from s to t (see [Sll35] for details).

B. The simple RSP betweenness

We first define the simple RSP betweenness centrality of
a node i with respect to absorbing paths from s to t as
the expected number of visits to i over all s-t-walks,
denoted by $\bar{n}_i(s,t)$, with respect to the RSP probabilities
of Equation (3). This can further be expressed based
on the expected number of passages through edges leaving
from node i over all s-t-walks, denoted by $\bar{n}_{ij}(s,t)$,
as:

\[
\mathrm{bet}^{\mathrm{RSP}}_i(s,t) = \bar{n}_i(s,t) = \sum_{j:(i,j) \in E} \bar{n}_{ij}(s,t).
\tag{7}
\]


The function $\mathrm{bet}^{\mathrm{RSP}}_i(s,t)$ can be useful for visualizing path
distributions and for path planning tasks between two
fixed nodes of the graph [13 113. Moreover, it can be
used to investigate how central an intermediate actor is
with respect to, for instance, the communication between
two other actors in a social network.

We then define the overall simple RSP betweenness centrality
of node i as the sum of contributions over all
source-target pairs on the graph:

\[
\mathrm{bet}^{\mathrm{RSP}}_i = \sum_{s=1}^{n} \sum_{t=1}^{n} \mathrm{bet}^{\mathrm{RSP}}_i(s,t).
\tag{8}
\]


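Next, we derive the method for computing this quantity
in closed form. Let us denote by $\eta(i \to j \in \wp)$ the number
of times that the edge (i, j) is traversed along the path
$\wp$. Then, by writing out the expression of $\bar{n}_{ij}(s,t)$, the
use of the partition function in the computation of the
RSP quantities becomes evident:

\[
\begin{aligned}
\bar{n}_{ij}(s,t) &= \sum_{\wp \in \mathcal{P}_{st}} P_{st}(\wp)\, \eta(i \to j \in \wp) \\
&= \frac{\sum_{\wp \in \mathcal{P}_{st}} P^{\mathrm{ref}}_{st}(\wp) \exp[-\beta c(\wp)]\, \eta(i \to j \in \wp)}
        {\sum_{\wp' \in \mathcal{P}_{st}} P^{\mathrm{ref}}_{st}(\wp') \exp[-\beta c(\wp')]} \\
&= \frac{\sum_{\wp \in \mathcal{P}_{st}} P^{\mathrm{ref}}_{st}(\wp) \exp[-\beta c(\wp)]\, \dfrac{\partial c(\wp)}{\partial c_{ij}}}
        {\sum_{\wp' \in \mathcal{P}_{st}} P^{\mathrm{ref}}_{st}(\wp') \exp[-\beta c(\wp')]} \\
&= -\frac{1}{\beta} \frac{\partial \log \mathcal{Z}_{st}}{\partial c_{ij}}.
\end{aligned}
\tag{9}
\]

Thus, the expected number of transitions $i \to j$ over s-t-walks
can be computed by differentiating the logarithm
of the partition function, which is common knowledge
in statistical physics (see, e.g., [lH]). Note that Equation (9) holds
only if there exists a path from s to t. Otherwise, naturally,
$\bar{n}_{ij}(s,t) = 0$.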

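By combining Equations (5) and (9), the computation of $\bar{n}_{ij}(s,t)$ can be written as:

$$\bar{n}_{ij}(s,t) = -\frac{1}{\beta}\,\frac{\partial \log \mathcal{Z}_{st}}{\partial c_{ij}} = -\frac{1}{\beta}\,\frac{\partial \log (z_{st}/z_{tt})}{\partial c_{ij}} = -\frac{1}{\beta}\left(\frac{1}{z_{st}}\frac{\partial z_{st}}{\partial c_{ij}} - \frac{1}{z_{tt}}\frac{\partial z_{tt}}{\partial c_{ij}}\right). \tag{10}$$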

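Therefore, we need to compute $\partial z_{st}/\partial c_{ij}$, which can be achieved using matrix formalism. If we denote by $\mathbf{e}_i$ the $(n \times 1)$-vector whose element $i$ is 1 and others are 0, then

$$\begin{aligned}
\frac{\partial z_{st}}{\partial c_{ij}} &= \frac{\partial (\mathbf{e}_s^{\mathsf{T}} \mathbf{Z} \mathbf{e}_t)}{\partial c_{ij}} = \frac{\partial \left(\mathbf{e}_s^{\mathsf{T}} (\mathbf{I}-\mathbf{W})^{-1} \mathbf{e}_t\right)}{\partial c_{ij}} = \mathbf{e}_s^{\mathsf{T}}\,\frac{\partial (\mathbf{I}-\mathbf{W})^{-1}}{\partial c_{ij}}\,\mathbf{e}_t \\
&= -\mathbf{e}_s^{\mathsf{T}}\,\mathbf{Z}\,\frac{\partial (\mathbf{I}-\mathbf{W})}{\partial c_{ij}}\,\mathbf{Z}\,\mathbf{e}_t = \mathbf{e}_s^{\mathsf{T}}\,\mathbf{Z}\,\frac{\partial \mathbf{W}}{\partial c_{ij}}\,\mathbf{Z}\,\mathbf{e}_t \\
&= -\beta\, p^{\mathrm{ref}}_{ij} \exp[-\beta c_{ij}]\; \mathbf{e}_s^{\mathsf{T}} \mathbf{Z} \mathbf{e}_i\, \mathbf{e}_j^{\mathsf{T}} \mathbf{Z} \mathbf{e}_t = -\beta\, w_{ij}\, z_{si}\, z_{jt},
\end{aligned} \tag{11}$$

where we used $\partial \mathbf{X}^{-1}/\partial x = -\mathbf{X}^{-1}\,(\partial \mathbf{X}/\partial x)\,\mathbf{X}^{-1}$ (see, e.g., [SU]). Equation (10) can therefore be rewritten as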


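$$\bar{n}_{ij}(s,t) = \frac{1}{z_{st}}\, z_{si}\, w_{ij}\, z_{jt} - \frac{1}{z_{tt}}\, z_{ti}\, w_{ij}\, z_{jt} = \left(\frac{z_{si}}{z_{st}} - \frac{z_{ti}}{z_{tt}}\right) w_{ij}\, z_{jt}. \tag{12}$$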
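The closed form of Equation (12) can be sanity-checked numerically against the derivative form $-\frac{1}{\beta}\,\partial \log(z_{st}/z_{tt})/\partial c_{ij}$ of Equation (10). The sketch below does this with a central finite difference on a small toy graph; the graph, the parameter values and all function names are hypothetical choices for illustration, not part of the paper's code.

```python
import numpy as np

def fundamental_matrix(P_ref, C, beta):
    """W and Z = (I - W)^{-1} as in Equation (6)."""
    W = P_ref * np.exp(-beta * C)
    return W, np.linalg.inv(np.eye(len(C)) - W)

def log_partition(P_ref, C, beta, s, t):
    """log of the partition function z_st / z_tt (Equation (5))."""
    _, Z = fundamental_matrix(P_ref, C, beta)
    return np.log(Z[s, t] / Z[t, t])

def edge_flow_closed_form(P_ref, C, beta, i, j, s, t):
    """Expected number of i -> j transitions from Equation (12)."""
    W, Z = fundamental_matrix(P_ref, C, beta)
    return (Z[s, i] / Z[s, t] - Z[t, i] / Z[t, t]) * W[i, j] * Z[j, t]

def edge_flow_finite_difference(P_ref, C, beta, i, j, s, t, eps=1e-6):
    """-(1/beta) d log(Z_st) / d c_ij, approximated by a central difference."""
    C_plus, C_minus = C.copy(), C.copy()
    C_plus[i, j] += eps
    C_minus[i, j] -= eps
    slope = (log_partition(P_ref, C_plus, beta, s, t)
             - log_partition(P_ref, C_minus, beta, s, t)) / (2 * eps)
    return -slope / beta

# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P_ref = A / A.sum(axis=1, keepdims=True)
C = A.copy()  # assumption: unit edge costs
beta, s, t, i, j = 0.7, 0, 3, 1, 2

closed = edge_flow_closed_form(P_ref, C, beta, i, j, s, t)
numeric = edge_flow_finite_difference(P_ref, C, beta, i, j, s, t)
print(closed, numeric)
```

The two values should agree up to the finite-difference error. Only the cost of the directed edge $(i,j)$ is perturbed, which matches the partial derivative with respect to $c_{ij}$.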


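Furthermore, the total flow transiting through node $i$, given that $i \neq t$, is

$$\bar{n}_i(s,t) = \sum_{j=1}^{n} \bar{n}_{ij}(s,t) = \left(\frac{z_{si}}{z_{st}} - \frac{z_{ti}}{z_{tt}}\right) \sum_{j=1}^{n} w_{ij}\, z_{jt}. \tag{13}$$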


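The expression on the right-hand side of Equation (13) can be further simplified in the following way. We know that $(\mathbf{I} - \mathbf{W})(\mathbf{I} - \mathbf{W})^{-1} = \mathbf{I}$, which implies that $(\mathbf{I} - \mathbf{W})\mathbf{Z} = \mathbf{I}$ and therefore that $\mathbf{Z} = \mathbf{W}\mathbf{Z} + \mathbf{I}$, or element-wise, $z_{it} = \sum_{j=1}^{n} w_{ij} z_{jt} + \delta_{it}$, where $\delta_{it}$ is the Kronecker delta. However, the term $\delta_{it}$ can be discarded when considering Equation (13), because when $i = t$, we anyway have $(z_{si}/z_{st} - z_{ti}/z_{tt}) = 0$. Thus, Equation (13) simplifies to

$$\bar{n}_i(s,t) = \left(\frac{z_{si}}{z_{st}} - \frac{z_{ti}}{z_{tt}}\right) z_{it}. \tag{14}$$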


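Finally, the simple RSP betweenness centrality (Equation (8)) can be computed with matrix manipulation:

$$\begin{aligned}
\mathrm{bet}_i^{\mathrm{RSP}} &= \sum_{s,t=1}^{n} \bar{n}_i(s,t) = \sum_{s,t=1}^{n} \left(\frac{z_{si}}{z_{st}} - \frac{z_{ti}}{z_{tt}}\right) z_{it} \\
&= \sum_{s,t=1}^{n} z_{it}\,\frac{1}{z^{\mathsf{T}}_{ts}}\,z_{si} \;-\; n \sum_{t=1}^{n} z_{it}\,\frac{1}{z_{tt}}\,z_{ti} \\
&= \left[\mathrm{diag}\!\left(\mathbf{Z}\,(\mathbf{Z}^{\div})^{\mathsf{T}}\,\mathbf{Z}\right) - n\,\mathrm{diag}\!\left(\mathbf{Z}\,\mathrm{Diag}(\mathbf{Z}^{\div})\,\mathbf{Z}\right)\right]_i \\
&= \left[\mathrm{diag}\!\left(\mathbf{Z}\left(\mathbf{Z}^{\div} - n\,\mathrm{Diag}(\mathbf{Z}^{\div})\right)^{\mathsf{T}}\mathbf{Z}\right)\right]_i,
\end{aligned} \tag{15}$$

where $\mathrm{diag}(\mathbf{X})$ and $\mathrm{Diag}(\mathbf{X})$ are, respectively, a column vector and a diagonal matrix containing the diagonal of $\mathbf{X}$, $\mathbf{X}^{\div}$ denotes the element-wise reciprocal matrix, i.e., $x^{\div}_{ij} = 1/x_{ij}$, and the superscript $\mathsf{T}$ denotes elements from the transposed matrix, i.e., $z^{\mathsf{T}}_{ij} = z_{ji}$. The vector $\mathbf{bet}^{\mathrm{RSP}}$ of all betweenness values is computed accordingly.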


The pseudocode for computing the simple RSP betweenness for all nodes is presented in Algorithm 1. In conclusion, the simple RSP betweenness scores of all nodes can be computed by performing the matrix inversion $\mathbf{Z} = (\mathbf{I} - \mathbf{W})^{-1}$ and then simple matrix operations according to Equation (15). Thus, the computational bottleneck of the algorithm is the matrix inversion, which, in general, has time complexity $O(n^3)$ and space complexity $O(n^2)$, because of which the method is currently not practical with very large networks.


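Algorithm 1 Computing the simple RSP betweenness vector of a graph $G$.

Input:

- A directed strongly connected graph $G$ with $n$ nodes.

- The $n \times n$ reference transition probability matrix $\mathbf{P}^{\mathrm{ref}}$ (defined from the adjacency matrix as $\mathbf{P}^{\mathrm{ref}} = \mathbf{D}^{-1}\mathbf{A}$).

- The $n \times n$ non-negative cost matrix $\mathbf{C}$.

- The inverse temperature parameter $\beta$.

Output:

- The $n \times 1$ simple RSP betweenness vector $\mathbf{bet}^{\mathrm{RSP}}$.

1. $\mathbf{W} \leftarrow \mathbf{P}^{\mathrm{ref}} \circ \exp[-\beta \mathbf{C}]$

2. $\mathbf{Z} \leftarrow (\mathbf{I} - \mathbf{W})^{-1}$

3. $\mathbf{Z}^{\div} \leftarrow \mathbf{e}\mathbf{e}^{\mathsf{T}} \div \mathbf{Z}$

4. $\mathbf{bet}^{\mathrm{RSP}} \leftarrow \mathrm{diag}\!\left(\mathbf{Z}\left(\mathbf{Z}^{\div} - n\,\mathrm{Diag}(\mathbf{Z}^{\div})\right)^{\mathsf{T}}\mathbf{Z}\right)$

5. return $\mathbf{bet}^{\mathrm{RSP}}$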
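The steps of Algorithm 1 translate almost line by line into NumPy. The following is a minimal sketch, not the authors' Matlab implementation; the function name `simple_rsp_betweenness`, the dense-matrix representation and the toy graph are assumptions made for illustration.

```python
import numpy as np

def simple_rsp_betweenness(A, C, beta):
    """Simple RSP betweenness of all nodes (Algorithm 1, Equation (15)).

    A: (n, n) adjacency matrix of a strongly connected graph.
    C: (n, n) non-negative cost matrix (only entries on edges matter).
    beta: inverse temperature, beta > 0.
    """
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)   # P_ref = D^{-1} A
    W = P_ref * np.exp(-beta * C)              # step 1
    Z = np.linalg.inv(np.eye(n) - W)           # step 2
    Z_div = 1.0 / Z                            # step 3: element-wise reciprocal
    M = Z_div - n * np.diag(np.diag(Z_div))    # Z_div - n Diag(Z_div)
    return np.diag(Z @ M.T @ Z).copy()         # step 4

# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
C = A.astype(float)  # assumption: unit edge costs
bet = simple_rsp_betweenness(A, C, beta=1.0)
print(np.round(bet, 4))
```

The dense inverse keeps the sketch close to the pseudocode and shares its $O(n^3)$ time and $O(n^2)$ space cost. The element-wise reciprocal in step 3 requires all entries of $\mathbf{Z}$ to be non-zero, which is why the graph is assumed to be strongly connected.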


The Matlab code for the algorithms presented in the paper, as well as the real world networks used in the experiments, are available online at https://github.com/ikivimak/RSP-betweenness

C. RSP net betweenness

Instead of only considering the overall outgoing flow of random walkers, as in the definition of Equation (7), it may in some cases make more sense to compute the net outgoing flow [6], i.e. so that the outgoing and ingoing flows through one edge neutralize each other. This corresponds to the random walk interpretation of the current flow betweenness in undirected graphs laiaisT]. Accordingly, we define the RSP net betweenness centrality of node $i$ as


$$\mathrm{bet}_i^{\mathrm{RSPnet}} = \sum_{s=1}^{n}\sum_{t=1}^{n}\sum_{j:(i,j)\in E} \left|\bar{n}_{ij}(s,t) - \bar{n}_{ji}(s,t)\right|. \tag{16}$$


The computation of this quantity for all nodes at once is a bit more involved than in the case of the simple RSP betweenness, because of the absolute value in the expression. A naive algorithm would loop over all $s$-$t$-pairs and compute the contribution of the corresponding paths to each intermediate node. However, a more efficient solution is to perform a loop over each edge of the network, compute separately the net flow through that edge over all $s$-$t$-paths, and add this net flow to the betweenness score of the starting node of the edge. This change in looping strategy reduces the number of loop iterations from $O(n^2)$ to $O(m)$, which makes a big difference when dealing with a sparse network. The faster computation can be achieved by writing out the matrix $\mathbf{N}^{ij}$, whose element $(s,t)$ is $\bar{n}_{ij}(s,t)$ from Equation (12):

$$\mathbf{N}^{ij} = \left[\, w_{ij}\left(\frac{z_{si}\, z_{jt}}{z_{st}} - \frac{z_{ti}\, z_{jt}}{z_{tt}}\right) \right]_{s,t} = w_{ij}\left[\left(\mathbf{z}^{c}_{i}\,(\mathbf{z}^{r}_{j})^{\mathsf{T}}\right) \div \mathbf{Z} - \mathbf{e}\left(\left(\mathbf{z}^{c}_{i} \circ \mathbf{z}^{r}_{j}\right) \div \mathrm{diag}(\mathbf{Z})\right)^{\mathsf{T}}\right], \tag{17}$$

where $\circ$ and $\div$ denote elementwise matrix multiplication and division, respectively, $\mathbf{z}^{c}_{i}$ and $\mathbf{z}^{r}_{j}$ denote the $(n \times 1)$-vectors corresponding, respectively, to the $i$-th column and $j$-th row of matrix $\mathbf{Z}$, and $\mathbf{e}$ is the $(n \times 1)$-vector whose elements are all 1.

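The overall net flow through edge $(i,j)$ can then be computed as

$$\bar{n}^{\mathrm{net}}_{ij} = \sum_{s,t=1}^{n} \left|\bar{n}_{ij}(s,t) - \bar{n}_{ji}(s,t)\right| = \mathbf{e}^{\mathsf{T}} \left|\mathbf{N}^{ij} - \mathbf{N}^{ji}\right| \mathbf{e}, \tag{18}$$

after which the betweenness score of each node can be simply computed by summing up the contributions of each edge connected to the node:

$$\mathrm{bet}_i^{\mathrm{RSPnet}} = \sum_{j:(i,j)\in E} \bar{n}^{\mathrm{net}}_{ij}. \tag{19}$$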

Algorithm 2 contains the pseudocode for computing the RSP net betweenness. In principle, the algorithm can be used with directed graphs and the result can be interpreted according to the net flow of random walks, even though the electric current interpretation only makes sense with undirected graphs. With undirected graphs, Algorithm 2 should be altered so that it considers each undirected edge only once and also increments the betweenness of node $j$ in addition to node $i$ at step 10. This reduces the computation time on undirected networks by a half. Algorithm 2 also contains the same matrix inversion as Algorithm 1, of time complexity $O(n^3)$. In addition to this, the other time-consuming task is the loop over all $m$ edges of the graph in steps 4-11, inside which elementary matrix operations of time complexity $O(n^2)$ have to be performed. Thus, the total time complexity of the algorithm is $O(n^3 + mn^2)$.

Although the RSP net betweenness can be computed for directed graphs, we have not found a good use case for this purpose. The definition on directed graphs is not as intuitive as the simple betweenness and, moreover, current flows and the current flow betweenness are normally defined only for undirected graphs. Nevertheless, it is possible to use Algorithm 2 with directed graphs, in addition to which it is possible to derive a similar algorithm for computing the directed version of the current flow betweenness based on the pseudoinverse of the Laplacian matrix of the graph.


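Algorithm 2 The RSP net betweenness vector of a graph $G$.

Input:

- Same as Algorithm 1.

Output:

- The $n \times 1$ RSP net betweenness vector $\mathbf{bet}^{\mathrm{RSPnet}}$.

1. $\mathbf{W} \leftarrow \mathbf{P}^{\mathrm{ref}} \circ \exp[-\beta \mathbf{C}]$

2. $\mathbf{Z} \leftarrow (\mathbf{I} - \mathbf{W})^{-1}$

3. $\mathbf{bet}^{\mathrm{RSPnet}} \leftarrow \mathbf{0}$

4. for $i = 1$ to $n$ do

5. $\quad \mathbf{z}^{c}_{i} \leftarrow \mathbf{Z}\mathbf{e}_i$, $\ \mathbf{z}^{r}_{i} \leftarrow \mathbf{Z}^{\mathsf{T}}\mathbf{e}_i$

6. $\quad$ for all $j$ such that $(i,j) \in E$ do

7. $\qquad \mathbf{z}^{c}_{j} \leftarrow \mathbf{Z}\mathbf{e}_j$, $\ \mathbf{z}^{r}_{j} \leftarrow \mathbf{Z}^{\mathsf{T}}\mathbf{e}_j$

8. $\qquad \mathbf{N}^{ij} \leftarrow w_{ij}\left[\left(\mathbf{z}^{c}_{i}(\mathbf{z}^{r}_{j})^{\mathsf{T}}\right) \div \mathbf{Z} - \mathbf{e}\left((\mathbf{z}^{c}_{i} \circ \mathbf{z}^{r}_{j}) \div \mathrm{diag}(\mathbf{Z})\right)^{\mathsf{T}}\right]$

9. $\qquad \mathbf{N}^{ji} \leftarrow w_{ji}\left[\left(\mathbf{z}^{c}_{j}(\mathbf{z}^{r}_{i})^{\mathsf{T}}\right) \div \mathbf{Z} - \mathbf{e}\left((\mathbf{z}^{c}_{j} \circ \mathbf{z}^{r}_{i}) \div \mathrm{diag}(\mathbf{Z})\right)^{\mathsf{T}}\right]$

10. $\qquad \mathrm{bet}_i^{\mathrm{RSPnet}} \leftarrow \mathrm{bet}_i^{\mathrm{RSPnet}} + \mathbf{e}^{\mathsf{T}}\left|\mathbf{N}^{ij} - \mathbf{N}^{ji}\right|\mathbf{e}$

11. $\quad$ end for

12. end for

13. return $\mathbf{bet}^{\mathrm{RSPnet}}$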
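A minimal NumPy sketch of the edge loop of Algorithm 2 is given below. It follows the directed version of the pseudocode (every ordered pair $(i,j)$ with an edge is visited), without the halving shortcut for undirected graphs. The function names, the dense representation and the toy graph are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def rsp_net_betweenness(A, C, beta):
    """RSP net betweenness of all nodes (Algorithm 2, Equations (17)-(19))."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = A.shape[0]
    P_ref = A / A.sum(axis=1, keepdims=True)
    W = P_ref * np.exp(-beta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    z_diag = np.diag(Z)

    def edge_flow_matrix(i, j):
        # N^{ij}: element (s, t) is the expected number of i -> j transitions
        # over s-t-walks, as in Equation (17).
        first = np.outer(Z[:, i], Z[j, :]) / Z     # z_si z_jt / z_st
        second = (Z[:, i] * Z[j, :]) / z_diag      # z_ti z_jt / z_tt, depends on t only
        return W[i, j] * (first - second[np.newaxis, :])

    bet = np.zeros(n)
    for i, j in zip(*np.nonzero(A)):               # loop over directed edges (i, j)
        net = np.abs(edge_flow_matrix(i, j) - edge_flow_matrix(j, i))
        bet[i] += net.sum()                        # e^T |N^{ij} - N^{ji}| e
    return bet

# Hypothetical toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
C = A.astype(float)  # assumption: unit edge costs
net_bet = rsp_net_betweenness(A, C, beta=1.0)
print(np.round(net_bet, 4))
```

Each edge costs $O(n^2)$ work for building and subtracting the two flow matrices, so the loop mirrors the $O(mn^2)$ term of the complexity stated above. For a directed edge whose reverse edge is absent, `W[j, i]` is zero and the reverse flow matrix vanishes.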


RSP betweenness centralities at the limit $\beta \to \infty$

The simple RSP betweenness centrality counts the expected number of visits to each node during RSPs between all source-target pairs of the graph. In the low temperature limit, i.e. when $\beta \to \infty$, the RSP probability distribution of Equation (3) focuses solely on the shortest paths of the graph and the expected number of visits to a node on a shortest path approaches the probability of following that particular path. Moreover, when $\beta \to \infty$, for all paths $\wp \in \mathcal{P}_{st}$ whose cost $c(\wp) > c^{*}_{st}$, where $c^{*}_{st}$ is the cost of a shortest $s$-$t$ path, $\exp(-\beta c(\wp)) \ll \exp(-\beta c^{*}_{st})$, because of which the partition function $\mathcal{Z}_{st}$ of Equation (4) becomes dominated by the terms determined by the shortest paths. As a result,

$$P_{st}(\wp) = \frac{P^{\mathrm{ref}}_{st}(\wp)\exp(-\beta c(\wp))}{\mathcal{Z}_{st}} \;\xrightarrow[\beta \to \infty]{}\; \begin{cases} 0 & \text{if } \wp \notin \mathcal{P}^{*}_{st}, \\[4pt] \dfrac{P^{\mathrm{ref}}_{st}(\wp)}{\sum_{\wp' \in \mathcal{P}^{*}_{st}} P^{\mathrm{ref}}_{st}(\wp')} & \text{if } \wp \in \mathcal{P}^{*}_{st}, \end{cases} \tag{20}$$

where $\mathcal{P}^{*}_{st}$ denotes the set of shortest paths from $s$ to $t$.

In other words, the RSP probability of a shortest path approaches the normalized likelihood of the path, which is also the contribution to the betweenness scores of the nodes along the path. Thus, the simple RSP betweenness converges to the shortest path likelihood betweenness defined in Section II A 1.


The same result holds for the RSP net betweenness. Intuitively, as the path distribution focuses more and more on the shortest paths, one of the two terms in the net flow in Equation (16) becomes zero, as the walker will only move in one direction along each edge for a given $s$-$t$-pair. Thus, the RSP net betweenness also approaches the shortest path likelihood betweenness, as $\beta \to \infty$.
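The low-temperature behavior can be illustrated on a graph where the shortest paths are unique. In the sketch below (a hypothetical three-node path graph 0-1-2 with unit costs, and a compact re-implementation of Algorithm 1 under the assumed name `simple_rsp_betweenness`), a large $\beta$ makes every walk follow the unique shortest path. By Equation (14), each node on that path is then visited once, except the target, whose contribution is zero. Counting over all ordered source-target pairs, the scores should therefore approach 2, 4 and 2.

```python
import numpy as np

def simple_rsp_betweenness(A, C, beta):
    """Compact version of Algorithm 1 (Equation (15))."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = A.shape[0]
    W = (A / A.sum(axis=1, keepdims=True)) * np.exp(-beta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    Z_div = 1.0 / Z
    M = Z_div - n * np.diag(np.diag(Z_div))
    return np.diag(Z @ M.T @ Z).copy()

# Hypothetical path graph 0 - 1 - 2 with unit edge costs.
A_path = np.array([[0, 1, 0],
                   [1, 0, 1],
                   [0, 1, 0]])
C_path = A_path.astype(float)

bet_cold = simple_rsp_betweenness(A_path, C_path, beta=10.0)
print(np.round(bet_cold, 6))
```

Much larger values of $\beta$ are best avoided in such a dense implementation, because the entries of $\mathbf{Z}$ that correspond to distant node pairs become extremely small before their reciprocals are taken.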


RSP betweenness centralities at the limit $\beta \to 0^{+}$

In the high temperature limit, as $\beta \to 0^{+}$, $\exp(-\beta c(\wp)) \to 1$ for all $\wp$, and the RSP probabilities of Equation (3) converge to the unbiased random walk probabilities, determined by the reference transition probabilities, i.e. $P_{st} \to P^{\mathrm{ref}}_{st}$ as $\beta \to 0^{+}$. This means that the simple RSP betweenness converges to the expected number of visits to a node over all absorbing walks with respect to the unbiased random walk probabilities. As presented in Section II A 2, this measure is proportional to the stationary distribution, if the network is strongly connected and aperiodic, where the multiplicative factor is the sum of average hitting times between all $s$-$t$-pairs. Thus, as mentioned in Section II A 2, for undirected networks, the measure becomes proportional to the degree (or strength) centrality. In the same limit, as $\beta \to 0^{+}$, the RSP net flow converges to the current flow betweenness centrality IHEI, as the edge flows $\bar{n}_{ij}(s,t)$ converge to the potential differences of adjacent nodes.


V. Experiments


The interpolation between common centrality measures already makes the simple RSP and RSP net betweenness centralities interesting. Furthermore, there are also cases in which the RSP betweenness centralities can be more relevant than their limit functions, the shortest path likelihood betweenness, the current flow betweenness and the stationary distribution. In this section these benefits will be illustrated, first with artificially generated networks, and later with two real networks of very different nature. The artificial examples show the behavior of the RSP betweenness measures in a network consisting of communities. The first real network is the street network of Lower and Midtown Manhattan, which serves as an example of an undirected network. The second is a subset of the directed Wikipedia hyperlink network.

All of the example cases presented here indicate benefits of using the RSP betweenness measures over the shortest path and random walk based betweenness measures. The benefits are clearer in the case of the simple RSP betweenness than the RSP net betweenness. The simple RSP betweenness is also preferable in terms of interpretability and computational efficiency compared to the net betweenness. However, the decision of which approach to use depends on the actual application and its premises. We have also experimented with several other types of graphs, including, for instance, examples presented by Brandes and Hildenbrand [55], which have been designed to differentiate centrality measures. However, for these and other simple cases, there is no clear difference in considering RSPs; rather, the rankings of nodes with intermediate values of $\beta$ are in essence the same as with the limit values of $\beta$.

A. Overall betweenness of an in-between community


One possible use for the RSP betweenness measures is the detection of groups of nodes that are central in a network. Consider a network consisting of three disjoint communities, A, B and C, which are highly intraconnected but loosely interconnected. In addition, community B is connected to communities A and C, which, however, do not share any edges with each other. In other words, community B is in between communities A and C and all paths between nodes of communities A and C have to go through community B. Such an organization is possible, for instance, in a hierarchical social network, where community B could represent the board of directors of a company. If, moreover, the graph is in general sufficiently sparse, then the shortest paths between communities A and C will run through only a few of the nodes of community B. Thus, betweenness measures based on shortest paths will only highlight those nodes of community B, whereas the other nodes of community B will get no contribution from the connections between communities A and C. For some applications, however, the nodes of community B should, in general, be considered more in-between than the nodes of communities A and C. Defining betweenness based on random walks, and especially RSPs, can help in this matter, as will be demonstrated next.

We first consider a simple example of a regular graph with communities organized in the order described above. The example is depicted in Figure 1, which shows a 5-regular graph containing three communities of 6 nodes, where one of the communities is connected to the other two. The graph has been constructed by considering three cliques of 6 nodes, and by removing and adding appropriate links to obtain the desired structure. Figure 1 contains the heat plots of the betweenness values of the nodes with the simple RSP betweenness (Figure 1(b)) and its limit functions, i.e. the shortest path likelihood betweenness (Figure 1(a)), which in this example equals the standard shortest path betweenness, and the stationary distribution multiplied by the sum of average hitting times (see section "RSP betweenness centralities at the limit $\beta \to 0^{+}$"), which, as the network is undirected, corresponds to the degree centrality (Figure 1(c)) up to a scaling factor.
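The correspondence with the degree centrality in the high-temperature limit can be checked numerically on a small undirected graph that is not regular. The sketch below uses a hypothetical four-node graph, unit edge costs, a small value $\beta = 10^{-5}$ as a stand-in for the limit, and a compact re-implementation of Algorithm 1 under the assumed name `simple_rsp_betweenness`. If the simple RSP betweenness is close to a constant multiple of the degree, the ratio of the two should be nearly the same for every node.

```python
import numpy as np

def simple_rsp_betweenness(A, C, beta):
    """Compact version of Algorithm 1 (Equation (15))."""
    A = np.asarray(A, dtype=float)
    C = np.asarray(C, dtype=float)
    n = A.shape[0]
    W = (A / A.sum(axis=1, keepdims=True)) * np.exp(-beta * C)
    Z = np.linalg.inv(np.eye(n) - W)
    Z_div = 1.0 / Z
    M = Z_div - n * np.diag(np.diag(Z_div))
    return np.diag(Z @ M.T @ Z).copy()

# Hypothetical undirected graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
degree = A.sum(axis=1)

bet_hot = simple_rsp_betweenness(A, A, beta=1e-5)   # unit edge costs (assumption)
ratio = bet_hot / degree                            # roughly constant if bet is proportional to degree
spread = (ratio.max() - ratio.min()) / ratio.mean()
print(ratio, spread)
```

The value of $\beta$ cannot be set to exactly zero in this implementation, because $\mathbf{I} - \mathbf{W}$ becomes singular when $\mathbf{W}$ is a stochastic matrix.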

The heat plots show that the simple RSP betweenness 
highlights the nodes of the central community more than 
its limit functions. The low temperature limit function, 
i.e. the shortest path likelihood betweenness highlights 
the nodes connecting the different communities, but the 
betweenness scores of the two other nodes in the central 
community are of the same magnitude as the scores of the 
other nodes in the peripheral communities. In the high 
temperature limit the simple RSP betweenness converges 
to the degree centrality, which is constant for all nodes, 
as the graph is regular. Although the heat plots show 
the actual betweenness scores, and not the rankings of 
the nodes according to them, the rankings also comply 
with the above findings. 

Using the RSP net betweenness (for which the results are not illustrated here), however, brings no benefit in this example, when compared to its limit functions. Namely, the current flow betweenness ranks the nodes of the central community a bit higher than the shortest path likelihood betweenness, but the RSP net betweenness does not increase those ranks with any intermediate values of $\beta$. On the other hand, the current flow betweenness values of the central community nodes are relatively much lower than the simple RSP betweenness values of Figure 1(b). However, the RSP net betweenness can be beneficial in a similar but slightly more complex setting, as will be shown next.

FIG. 1. A 5-regular graph with three communities, and the heat plots of its betweenness values with the shortest path likelihood betweenness (a), the simple RSP betweenness with $\beta = 0.01$ (b) and the degree centrality (c). Blue indicates low, and red high betweenness values.

For a more complex example, we generated random networks with the LFR algorithm of Lancichinetti et al., designed to construct scale-free networks with a community structure. This experiment confirms further the usefulness of the simple RSP, as well as the RSP net betweenness measure. We generated graphs consisting of three communities, A, B and C, and then simply removed the edges between two of the communities, A and C. The size of the communities was set to 120 nodes, resulting in networks with 360 nodes, the average degree of the network was set to 10, the maximum degree to 120 and the power-law exponent of the degree distribution to $-2$. We tested three different values of the mixing coefficient, $\mu \in \{0.01, 0.05, 0.1\}$, which essentially determines the probability of having an edge between two communities after the degrees of nodes have been fixed. For each generated network, we computed the shortest path likelihood betweenness, the degree centrality, the current flow betweenness as well as the simple RSP and RSP net betweenness scores with several different values of the parameter $\beta$. We then ranked the nodes according to each list of betweenness scores (with rank 1 naturally meaning the node with the highest score) and computed the average rank of the nodes in the central community B. We repeated the graph generation and the computation of average ranks 200 times and report the mean average rank of the nodes in the in-between community over these 200 runs.
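The ranking statistic described above can be sketched as follows. The helper names `average_rank` and `mean_and_standard_error`, the tie-breaking rule (ties resolved by node index) and the toy numbers are assumptions for illustration; the network generation itself is not reproduced here.

```python
import numpy as np

def average_rank(scores, group):
    """Average rank of the nodes in `group`, where rank 1 is the highest score.

    Ties are broken by node index, which is an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")      # node indices, best score first
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)    # rank of each node
    return ranks[list(group)].mean()

def mean_and_standard_error(values):
    """Mean and standard error of the mean over repeated runs."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))

# Toy example: four nodes, community B = {1, 2}.
scores = [0.2, 0.9, 0.5, 0.1]
rank_B = average_rank(scores, [1, 2])

# Aggregating hypothetical average ranks from two runs.
mean_rank, sem = mean_and_standard_error([1.5, 2.5])
print(rank_B, mean_rank, sem)
```

In the experiment, `average_rank` would be evaluated once per generated network and per value of $\beta$, and `mean_and_standard_error` would then summarize the 200 resulting values.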


The results are plotted in Figure 2, which shows that in most cases both the simple RSP and the RSP net betweenness, with some intermediate values of $\beta$, rank the nodes of the in-between community more central than their limiting functions. The plots show the mean average rank, as well as the standard error of the mean, of the nodes of the central community B with different values of the inverse temperature $\beta$. The values at the extremes of the plots correspond to the results obtained with the limiting functions, with the left end corresponding to the low-temperature (high $\beta$) case, i.e. the shortest path likelihood betweenness in both plots, and the right end to the high-temperature (low $\beta$) case, i.e. the degree centrality in Figure 2(a) and the current flow betweenness in Figure 2(b). It is evident from the figures that with some intermediate value of $\beta$ the RSP betweenness measures in this setting often manage to rank the nodes of the central group B higher than the limit functions by taking into account other connections besides only the shortest ones. Note that this does not mean that the RSP betweenness rankings are lower than the rankings with the limit functions on each individual network, but that the result holds on average over the 200 generated networks. The fluctuations of the rankings are indicated by the error bars showing the standard error of the mean over the 200 networks.

FIG. 2. The mean average rank of the nodes of community B, which lies in between two other communities, A and C, based on the nodes' simple RSP betweenness (a) and RSP net betweenness values (b) over 200 networks of 360 nodes generated using the LFR algorithm, as described in the body text (with low rank meaning a high betweenness score). The results are plotted for varying values of $\beta$ and for three values of the mixing parameter $\mu$, with error bars indicating the standard error of the mean over the 200 runs. In both plots, the values at the left end of the curves (as $\beta \to \infty$) show the results with the shortest path likelihood betweenness and at the right end (as $\beta \to 0^{+}$) the results with the degree centrality in (a), and the current flow betweenness in (b).

B. Manhattan street network 

One promising application area for RSPs is routing or path planning problems. RSPs allow the modeling of routing in situations that include an element of randomness, such as navigation of people or animals in an environment. On the other hand, by definition, RSPs can be used for planning paths in an optimal way while keeping the predictability of the path at a desired level. RSPs could also be used for avoiding congestion problems in transportation and traffic networks.

We illustrate the use of RSPs for routing in a network by analyzing the street network of Midtown and Lower Manhattan. We have extracted this network from OpenStreetMap [52]. The nodes in the network correspond to intersections and the edges are the street segments between the intersections. We treat the network as undirected, and analyze the network using both the simple RSP and the RSP net betweenness.

The length of each street segment is assigned as the cost of the corresponding edge. Accordingly, the overall cost of a path is its overall length. However, we define here the reference transition probabilities of the random walk according to the degree of each node, $p^{\mathrm{ref}}_{ij} = 1/d_i$, i.e. only according to the number of edges connected to the node and independent of the edge costs. This seems like a reasonable choice for moving in a street network, if we consider that the decision of direction of the random walker in an intersection is not affected by the lengths of the street segments. Remember that this means that the shortest path likelihood betweenness on the graph is based on the edge costs, whereas the random walk based betweenness measures do not depend on the costs, but only on the degrees of nodes.
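This setup, with costs taken from segment lengths and reference probabilities taken from degrees only, can be sketched as below. The function name `street_network_matrices`, the edge-list input format and the four-intersection example with made-up lengths in meters are all hypothetical; the actual OpenStreetMap extraction is not reproduced.

```python
import numpy as np

def street_network_matrices(n, segments):
    """Build P_ref and C for an undirected street network.

    segments: iterable of (u, v, length) tuples, one per street segment.
    P_ref uses p_ij = 1 / d_i, independent of the segment lengths.
    """
    A = np.zeros((n, n))
    C = np.zeros((n, n))
    for u, v, length in segments:
        A[u, v] = A[v, u] = 1.0        # undirected segment
        C[u, v] = C[v, u] = length     # cost = segment length
    degree = A.sum(axis=1)
    P_ref = A / degree[:, np.newaxis]
    return P_ref, C

# Hypothetical block with four intersections and one diagonal street (lengths in meters).
segments = [(0, 1, 80.0), (1, 2, 270.0), (2, 3, 80.0), (3, 0, 270.0), (0, 2, 285.0)]
P_ref, C = street_network_matrices(4, segments)
print(P_ref)
print(C)
```

When costs are lengths in meters, the useful range of $\beta$ scales inversely with the typical segment length, since only the product $\beta c_{ij}$ enters $\mathbf{W}$.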


The heat plots of the simple RSP and the RSP net betweenness measures and their limit functions on the Manhattan street network are depicted in Figure 3. At low temperatures (large $\beta$), both RSP based betweenness measures converge to the shortest path likelihood betweenness, shown in Figure 3(a). Figures 3(b), 3(d) and 3(f) show the betweenness values obtained with the simple RSP betweenness, and Figures 3(c), 3(e) and 3(g) the values obtained with the RSP net betweenness with different values of $\beta$. As the network is undirected and connected, and because we use reference probabilities based only on degrees, instead of costs, the limit function of the simple RSP betweenness, when $\beta \to 0^{+}$, is equal to the degree centrality multiplied by a constant, shown in Figure 3(h). Finally, the limit function of the RSP net betweenness is the current flow betweenness, which is presented in Figure 3(i).


This example shows one strength of both of the RSP betweenness measures. It is evident from the plot that the shortest path betweenness is focused on Broadway, which functions as a diagonal shortcut in many routes on the grid-like Midtown. However, when $\beta$ is decreased, both RSP based measures rank highest the intersections along the FDR Drive on the eastern shore. This is mainly caused by the sparsity of streets on the east shore close to the residential areas of Stuyvesant Town and Peter Cooper Village. As a result, the FDR Drive is a vital connection between the upper and lower eastern parts of the map. This aspect is not clear from the shortest path likelihood betweenness, but becomes apparent by computing the RSP-based betweenness values. The current flow betweenness also ranks high the intersections of FDR Drive, but as a drawback the importance of Broadway is not as apparent as it perhaps deserves.

FIG. 3. The interpolation between the shortest path and random walk betweenness measures with the simple RSP and the RSP net betweenness measures on the Midtown and Lower Manhattan street network. The network has been extracted from OpenStreetMap [52] (for the data and code, see Materials).

Thus, it seems that the RSP based betweenness measures, by assuming suboptimal navigation between points in the network, can highlight bottlenecks such as the FDR Drive on the Manhattan network better than the deterministic shortest path betweenness or the unbiased current flow betweenness. Comparing the two RSP betweenness measures, it is hard to find any major differences. However, in a street network, when considering the movement of people or vehicles, the simple RSP betweenness has a more sensible physical interpretation than the net betweenness. Moreover, again, the computation of the simple RSP betweenness is much more efficient and its interpretation clearer than that of the net betweenness, which makes it the preferable candidate. On the other hand, there may also be applications for which the net flow interpretation is more relevant, in which case the net betweenness should be used.

C. A subnetwork of Wikipedia 

Here we illustrate the behavior of the simple RSP betweenness on a directed real network. The network is a subnetwork of the hyperlink network of Wikipedia. It consists of the Wikipedia page on Network Science and the pages that contain a hyperlink to it, or are linked to from it. We only consider the largest strongly connected component of this network, which contains 151 nodes. We only report the results obtained with the simple RSP betweenness, and not the RSP net betweenness, because the net flow interpretation is not particularly suitable when studying the World Wide Web, and in this example it does not provide any interesting results.


Figure 4 shows the change in the rankings of the nodes of the Wikipedia subnetwork with the simple RSP betweenness centrality (Figure 4(b)) as well as with its limiting functions, the shortest path likelihood betweenness (Figure 4(a)) and the stationary distribution (Figure 4(c)). In addition to the heat plots, there are lists of the top ten nodes according to each betweenness centrality. The plots directly illustrate the general structure of the network, with a division into a tightly intra-connected cluster, appearing in the lower left corner of the plots, and a more sparsely connected peripheral part. The dense cluster consists of nodes related mostly to social networks, whereas the other nodes correspond to more general concepts.


The shortest path likelihood betweenness ranks quite high many nodes from both of the two groups of nodes, including general nodes such as the seed node 'Network science', whereas the stationary distribution focuses mostly on the dense cluster of nodes related to social networks, highlighting especially particular social networking websites, such as 'Myspace' and 'Pinterest'. Interestingly, even the seed node 'Network science' is not in the top ten nodes according to the stationary distribution. Here, again, the simple RSP betweenness makes an interesting compromise between these two extremes by respecting the high connectivity of the social networks cluster while also highlighting important, general nodes, such as 'Mathematics' and 'Computer science' from the peripheral group.

(a) Shortest path

1. Network science
2. Network theory
3. Social network
4. Graph theory
5. Social network analysis
6. PageRank
7. Small-world network
8. Sociology
9. Small world experiment
10. Orkut

(b) RSP, $\beta = 10^{-1}$

1. Graph theory
2. Social network
3. Social network analysis
4. Mathematics
5. Sociology
6. Network theory
7. Computer science
8. Social networking service
9. PageRank
10. Network science

(c) Stationary distribution

1. Graph theory
2. Social network
3. Social networking service
4. Myspace
5. Social network analysis
6. Pinterest
7. Orkut
8. Small world experiment
9. Tumblr
10. Social network analysis software

FIG. 4. The interpolation between the shortest path likelihood betweenness and the stationary distribution with the simple RSP betweenness on the subnetwork of Wikipedia. The color of a node in the plots indicates its rank w.r.t. the betweenness measure; red and blue indicate high and low rank (i.e. high and low betweenness values), respectively. The dense cluster of nodes in the lower left corner consists mostly of nodes related to social networks. Below the plots, the 10 highest-ranked nodes are listed.

This example indicates the potential of the simple RSP betweenness in analyzing and exploiting semantic and associative networks, as in [53]. Moreover, the example can help in applications of web design and marketing, for instance, considering situations where a user tries to find a certain page on a web site by following hyperlinks of that site. This can happen, for example, when browsing videos on YouTube, or when playing the Wiki Game, in which the purpose is to find a target Wikipedia page from a starting page only by clicking the hyperlinks on the pages.

VI. Conclusion 

We have presented two new graph node betweenness centrality measures based on Randomized Shortest Paths. The first measure, the simple RSP betweenness centrality, counts the expected number of visits to a node, while the second, the RSP net betweenness, is based on the overall net flow over edges connected to a node. Both of these measures are parametrized generalizations of more traditional betweenness centrality measures. The RSP net betweenness and its high-temperature limit function, the current flow betweenness, seem theoretically more elaborate than the simple RSP betweenness and its limit function, the stationary distribution. However, based on our experiments, the simple RSP betweenness seems to provide more satisfying and practical results than the net betweenness, in addition to which (and perhaps partly because) it is easier to interpret. In general, our experiments have shown that the RSP betweenness centralities can provide interesting insight into the role and importance of the nodes in a network in ways that the more traditional betweenness measures based on either shortest paths or unbiased random walks cannot achieve.

The RSP betweenness measures could be further compared with other centrality measures, which could be a subject for future work. However, the main purpose of this paper is only to focus on betweenness centrality measures, to introduce the RSP based methods and their computation and to provide some examples where one could benefit from using them. Also, the paper only considers betweenness as a global measure on nodes, but the methods can easily be extended to other uses such as edge betweenness, betweenness w.r.t. a group of nodes or betweenness between groups of source and target nodes, all of which have relevant applications. One drawback of the RSP framework is that it often requires a full matrix inverse, because of which it is not currently practical for very large networks. One topic for future research is to develop methods that would allow estimating the RSP-based quantities for large networks either by using more specialized computational methods or by approximation.

Although the computational complexity of the RSP-based methods can be too high for very large problems, the implementation and the interpretation of the computations are quite straightforward. Moreover, the framework lies on solid theoretical grounds, and considering the generalization of shortest paths by randomization makes sense for many application scenarios. In our examples we have shown them to give promising results in highlighting nodes that belong to a central group of nodes, in detecting possible bottlenecks in street networks for navigation modeling and also in evaluating the visit rate of pages on the World Wide Web.